IMAGE PROCESSING APPROACH FOR EXTRACTING TABLES FROM SCANNED DOCUMENTS
DOI:
https://doi.org/10.17605/OSF.IO/RN8WK
Keywords:
Image Processing, Optical Character Recognition
Abstract
With the data revolution of the 21st century, processing the ever-increasing volume of documents has become essential. Most data in the banking, financial, and administrative domains is still stored in physical documents, so there is a strong need to automate their processing. Much of the useful data in these documents is stored in tables. To preserve the value of the extracted data, table contents must be extracted with the tabular structure intact. We use an image processing approach to extract these tables and the data they contain. We perform operations on scanned documents to identify the rows and columns of each table, and then extract the text from each cell using Optical Character Recognition (OCR). We applied this approach to bordered tables and achieved more than 90% accuracy in extracting the tabular data.
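The abstract does not say which image operations are used to find rows and columns. As an illustration only, the sketch below assumes one common technique for bordered tables: on a binarized page, a row or column of pixels that is almost entirely ink is treated as a ruling line, and the cells are the rectangles between consecutive rulings. The names `find_rulings`, `cell_boxes`, and the `min_fill` threshold are hypothetical and not taken from the paper, and the OCR step is omitted because it depends on an external engine.

```python
from typing import List, Tuple

# Assumed input: a binarized page as nested lists, 1 = ink, 0 = background.
Image = List[List[int]]
Run = Tuple[int, int]            # inclusive (start, end) of one ruling line
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def find_rulings(profile: List[int], length: int, min_fill: float = 0.9) -> List[Run]:
    """Return runs of consecutive rows/columns whose ink count covers
    at least min_fill of the given length (hypothetical threshold)."""
    runs: List[Run] = []
    start = None
    for i, count in enumerate(profile + [0]):  # sentinel closes a trailing run
        is_line = count >= min_fill * length
        if is_line and start is None:
            start = i
        elif not is_line and start is not None:
            runs.append((start, i - 1))
            start = None
    return runs


def cell_boxes(img: Image, min_fill: float = 0.9) -> List[List[Box]]:
    """Return a row-major grid of cell boxes lying between the ruling lines."""
    height, width = len(img), len(img[0])
    row_profile = [sum(row) for row in img]
    col_profile = [sum(img[y][x] for y in range(height)) for x in range(width)]
    h_lines = find_rulings(row_profile, width, min_fill)
    v_lines = find_rulings(col_profile, height, min_fill)
    return [
        [
            (top_end + 1, left_end + 1, bottom_start, right_start)
            for (_, left_end), (right_start, _) in zip(v_lines, v_lines[1:])
        ]
        for (_, top_end), (bottom_start, _) in zip(h_lines, h_lines[1:])
    ]
```

Each returned box could then be cropped from the page image and passed to an OCR engine of choice, which keeps the row and column position of every extracted string. Real scans would also need deskewing and noise removal before the projection step, which this sketch does not attempt.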
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.